Active Learning with Scarcely Labeled Data via Bias Variance Reduction
Author
Abstract
In many real-life situations we face the problem of classifying partially labeled data, or semi-supervised learning. We consider the special case of scarcely labeled data, where the labeled instances are too few to train a reliable classifier, and present a principled method that applies active learning to scarcely labeled data to improve the learner's performance. The method builds on recent bias-variance decomposition work for the 0-1 loss function. We reduce 0-1 loss by first selecting a random pool from the unlabeled data and then using the most informative instances from that pool to reduce the learner's variance, bias, and thereby overall loss via active learning. Our empirical results show that this technique can decrease the loss of the learner significantly.
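The abstract outlines a pool-based selection loop. Below is a minimal sketch of that loop in Python, assuming uncertainty sampling as the informativeness score; the paper's actual criterion is the bias/variance estimate from the 0-1 loss decomposition, and the model, pool size, and synthetic data here are illustrative assumptions.

```python
# Minimal pool-based active-learning round (a sketch, not the paper's exact
# bias/variance criterion): draw a random pool from the unlabeled data, score
# each candidate by an informativeness proxy (here, prediction uncertainty),
# and return the top candidates for labeling.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def active_learning_round(model, X_labeled, y_labeled, X_unlabeled,
                          pool_size=100, n_queries=10):
    """Return indices (into X_unlabeled) of the instances to label next."""
    pool_idx = rng.choice(len(X_unlabeled),
                          size=min(pool_size, len(X_unlabeled)), replace=False)
    model.fit(X_labeled, y_labeled)
    proba = model.predict_proba(X_unlabeled[pool_idx])
    # Uncertainty score: 1 - max class probability (higher = more informative).
    uncertainty = 1.0 - proba.max(axis=1)
    top = np.argsort(uncertainty)[-n_queries:]
    return pool_idx[top]

# Usage on synthetic data: start from a scarce labeled set and grow it.
X = rng.normal(size=(500, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
labeled = list(range(10))                      # scarcely labeled start
unlabeled = np.array([i for i in range(500) if i not in labeled])

query = active_learning_round(LogisticRegression(), X[labeled], y[labeled],
                              X[unlabeled])
print("next instances to label:", unlabeled[query])
```

In practice the queried instances would be sent to a human annotator, appended to the labeled set, and the round repeated until the labeling budget is spent.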
Similar Resources
Active Learning with Partially Labeled Data via Bias Reduction
With active learning the learner participates in the process of selecting instances so as to speed up convergence to the "best" model. This paper presents a principled method of instance selection based on the recent bias-variance decomposition work for a 0-1 loss function. We focus on bias reduction to reduce 0-1 loss by using an approximation to the optimal Bayes classifier to calculate the b...
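As a rough illustration of this bias-driven selection, the sketch below substitutes a random-forest probability model for the approximation to the optimal Bayes classifier and queries the pool instances where the learner deviates most from it; the surrogate and the disagreement score are assumptions, not the paper's construction.

```python
# Hedged sketch of bias-driven instance selection: use a stronger
# probabilistic model as a stand-in for the optimal Bayes classifier and
# query pool instances where the learner's predictions deviate most from it.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def bias_driven_queries(learner, X_labeled, y_labeled, X_pool, n_queries=10):
    surrogate = RandomForestClassifier(n_estimators=200, random_state=0)
    surrogate.fit(X_labeled, y_labeled)
    learner.fit(X_labeled, y_labeled)
    # Estimated-bias proxy: disagreement between learner and the Bayes
    # surrogate on the positive-class probability (binary task assumed).
    gap = np.abs(learner.predict_proba(X_pool)[:, 1]
                 - surrogate.predict_proba(X_pool)[:, 1])
    return np.argsort(gap)[-n_queries:]
```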
Correcting Sampling Bias in Structural Genomics through Iterative Selection of Underrepresented Targets
In this study we propose an iterative procedure for correcting sampling bias in labeled datasets for supervised learning applications. Given a much larger and unbiased unlabeled dataset, our approach relies on training contrast classifiers to iteratively select unlabeled examples most highly underrepresented in the labeled dataset. Once labeled, these examples could greatly reduce the sampling...
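One iteration of the contrast-classifier idea can be sketched as follows: train a classifier to separate the labeled set from the unlabeled set, and treat the unlabeled examples it identifies most confidently as the ones most underrepresented among the labeled data. The logistic-regression contrast model is an illustrative choice, not the study's exact setup.

```python
# One contrast-classifier iteration: separate labeled from unlabeled data,
# then select the unlabeled examples the classifier is most confident about,
# i.e. those least represented in the labeled set.
import numpy as np
from sklearn.linear_model import LogisticRegression

def most_underrepresented(X_labeled, X_unlabeled, n_select=10):
    X = np.vstack([X_labeled, X_unlabeled])
    origin = np.r_[np.zeros(len(X_labeled)), np.ones(len(X_unlabeled))]
    contrast = LogisticRegression(max_iter=1000).fit(X, origin)
    # P(example came from the unlabeled pool): high values mark examples
    # unlike anything in the labeled set.
    p_unlabeled = contrast.predict_proba(X_unlabeled)[:, 1]
    return np.argsort(p_unlabeled)[-n_select:]   # label these next
```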
Semi-Supervised Dimensionality Reduction via Canonical Correlation Analysis
We analyze the multi-view regression problem where we have two views (X1, X2) of the input data and a real target variable Y of interest. In a semi-supervised learning setting, we consider two separate assumptions (one based on redundancy and the other based on (de)correlation) and show how, under either assumption alone, dimensionality reduction (based on CCA) could reduce the labeled sample co...
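A minimal sketch of the CCA route, assuming scikit-learn's CCA: the projection is fit on the two views without any labels, and the regressor then needs only the scarce labeled sample in the low-dimensional space. The synthetic two-view data and dimensions are assumptions for illustration.

```python
# Semi-supervised dimensionality reduction via CCA: fit the projection on all
# (unlabeled) two-view data, then regress on the CCA features using only the
# small labeled subset.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 3))                           # shared latent signal
X1 = Z @ rng.normal(size=(3, 20)) + 0.1 * rng.normal(size=(1000, 20))
X2 = Z @ rng.normal(size=(3, 20)) + 0.1 * rng.normal(size=(1000, 20))
y = Z[:, 0]

cca = CCA(n_components=3).fit(X1, X2)    # unsupervised: no labels needed
U1, _ = cca.transform(X1, X2)            # low-dimensional features for view 1

labeled = np.arange(30)                  # scarce labeled sample
model = Ridge().fit(U1[labeled], y[labeled])
print("R^2 on the rest:", model.score(U1[30:], y[30:]))
```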
Weighted Proportional k-Interval Discretization for Naive-Bayes Classifiers
The use of different discretization techniques can be expected to affect the classification bias and variance of naive-Bayes classifiers. We call such an effect discretization bias and variance. Proportional k-interval discretization (PKID) tunes discretization bias and variance by adjusting discretized interval size and number proportional to the number of training instances. Theoretical analys...
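As a rough sketch of the PKID rule described above, the snippet below sets both the number of intervals and the expected instances per interval to about sqrt(N), so both grow with the training-set size; the equal-frequency splitting is an illustrative simplification of the actual discretization.

```python
# Proportional k-interval discretization (sketch): with N training values,
# use ~sqrt(N) intervals of ~sqrt(N) instances each, so discretization bias
# and variance are both reduced as N grows.
import numpy as np

def pkid(values):
    n = len(values)
    t = max(1, int(np.sqrt(n)))          # number of intervals ~ sqrt(N)
    order = np.argsort(values)
    bins = np.array_split(order, t)      # each bin holds ~ sqrt(N) instances
    labels = np.empty(n, dtype=int)
    for b, idx in enumerate(bins):
        labels[idx] = b
    return labels

x = np.random.default_rng(0).normal(size=100)
print(np.bincount(pkid(x)))              # 10 intervals of ~10 instances each
```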
Incremental Active Learning in Consideration of Bias
The problem of designing input signals for optimal generalization in supervised learning is called active learning. Many active learning methods devised so far select the sampling location that minimizes the variance of the learning results. This implies that the bias of the learning results is assumed to be zero, or small enough to be neglected. In this paper, we propose an active learning...
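The variance-only criterion this paper argues against can be sketched for linear regression as a greedy A-optimal choice: pick the candidate input that most shrinks the trace of the inverse Gram matrix of the design. The ridge term and the greedy search here are illustrative assumptions.

```python
# Variance-minimizing input selection (sketch): for linear regression,
# greedily pick the candidate input that most reduces the parameter-estimate
# variance, trace((X^T X)^{-1}), ignoring bias entirely.
import numpy as np

def pick_min_variance(X_train, X_candidates, ridge=1e-6):
    d = X_train.shape[1]
    best, best_score = None, np.inf
    for i, x in enumerate(X_candidates):
        # Gram matrix after hypothetically adding candidate x to the design;
        # the small ridge term keeps the inverse well defined.
        G = X_train.T @ X_train + np.outer(x, x) + ridge * np.eye(d)
        score = np.trace(np.linalg.inv(G))   # A-optimality criterion
        if score < best_score:
            best, best_score = i, score
    return best
```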